Abstract. The setcover greedy algorithm is a natural approximation
algorithm for the test set problem. This paper gives a precise and tighter
analysis of the approximation guarantee of this algorithm. The author improves
the performance guarantee $2 \ln n$, which derives from the set cover problem,
to $1.1354 \ln n$ by applying the potential function technique. In addition,
the author gives a nontrivial lower bound $1.0004471 \ln n$ on the performance
guarantee of this algorithm. This lower bound, together with the matching
bound of the information content heuristic, confirms that the information
content heuristic is slightly better than the setcover greedy algorithm
in the worst case.

1 Introduction 

The test set problem is NP-hard. The polynomial-time approximation algorithms
used in practice include "greedy" heuristics implemented by the set cover
criterion or by the information criterion. Test set cannot be approximated
within $(1 - \varepsilon) \ln n$ for any $\varepsilon > 0$ unless
$NP \subseteq DTIME(n^{\log \log n})$. Recently, the authors of [5]
designed a new information-type greedy algorithm, the information content
heuristic (ICH for short), and proved its performance guarantee $\ln n + 1$,
which almost matches the inapproximability results.

The setcover greedy algorithm (SGA for short) is a natural approximation
algorithm for test set. In practice, its average performance is virtually the
same as that of information-type greedy algorithms [1,4]. The performance
guarantee $2 \ln n$ of SGA is obtained by transforming the test set problem
into a set cover problem. The authors of [5] give the tight performance
guarantee 11/8 of SGA on instances in which no test has size greater than 2.

Oblivious rounding, a derandomization technique that obtains simple greedy
algorithms for set cover problems by conditional probabilities, was introduced
in [5]. Young observed that the number of uncovered elements is a "potential
function" and that the approximation algorithm only needs to drive down the
potential function at each step; he thus gave another proof of the well-known
performance guarantee $\ln n + 1$.

In this paper, the author presents a tighter analysis of SGA. The author uses
the potential function technique of [5] to improve the performance guarantee
$2 \ln n$, which derives from the set cover problem, to $1.1354 \ln n$, and
constructs instances that give a nontrivial lower bound $1.0004471 \ln n$ on
the performance guarantee. The latter result confirms that ICH is slightly
better than SGA in the worst case. The analysis draws on the tight analysis of
the greedy algorithm for the set cover problem in [6].

In Section 2, the author states the two main theorems and gives some
definitions, notation and facts. In Section 3, the author analyzes the
differentiation distribution of item pairs and uses the potential function
method to prove the improved performance guarantee. In Section 4, the author
shows the nontrivial lower bound by constructing certain test set instances.
Section 5 contains some discussion.

2 Overview 

The input of the test set problem consists of $S$, a set of items (called the
universe), and $\mathcal{T}$, a collection of subsets (called tests) of $S$.
A test $T$ differentiates an item pair $\{i,j\}$ if $|T \cap \{i,j\}| = 1$.
$\mathcal{T}$ is a test set of $S$, that is, every item pair of $S$ is
differentiated by at least one test in $\mathcal{T}$. The objective is to find
$\mathcal{T}' \subseteq \mathcal{T}$ with minimum cardinality which is also a
test set of $S$. We use $\mathcal{T}^*$ to represent the optimal test set. Let
$n = |S|$ and $m^* = |\mathcal{T}^*|$. In this paper, we assume $m^* \ge 2$.
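The definitions above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `differentiates` and `is_test_set` and the toy instance are assumptions made here, not part of the paper.

```python
from itertools import combinations

def differentiates(test, i, j):
    """A test differentiates {i, j} iff it contains exactly one of the two items."""
    return (i in test) != (j in test)

def is_test_set(items, tests):
    """True iff every pair of distinct items is differentiated by some test."""
    return all(
        any(differentiates(t, i, j) for t in tests)
        for i, j in combinations(sorted(items), 2)
    )

# Hypothetical toy instance: four items and three tests.
items = {1, 2, 3, 4}
tests = [frozenset({1, 2}), frozenset({1, 3}), frozenset({4})]
```

On this toy instance the first two tests already differentiate all six item pairs, while the first test alone does not.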

In an instance of the test set problem, there are $\binom{n}{2}$ different
item pairs. Let $i, j$ be two different items, and let $S_1, S_2$ be two
disjoint sets. If $i, j \in S_1$, we say $\{i,j\}$ is an item pair inside of
$S_1$, and if $i \in S_1$ and $j \in S_2$, we say $\{i,j\}$ is an item pair
between $S_1$ and $S_2$.

We use $\{i,j\} \perp T$ to represent that $T$ differentiates $\{i,j\}$, and
$\{i,j\} \parallel T$ to represent that $T$ does not differentiate $\{i,j\}$.
We use $\{i,j\} \perp \mathcal{T}$ to represent that at least one test in
$\mathcal{T}$ differentiates $\{i,j\}$, $\{i,j\} \parallel \mathcal{T}$ to
represent that no test in $\mathcal{T}$ differentiates $\{i,j\}$, and
$\perp(\{i,j\}, \mathcal{T})$ to represent the number of tests in
$\mathcal{T}$ that differentiate $\{i,j\}$.

Fact 1. For three different items $i$, $j$ and $k$, if $\{i,j\} \parallel T$
and $\{i,k\} \parallel T$, then $\{j,k\} \parallel T$.

Fact 2. For three different items $i$, $j$ and $k$, if $\{i,j\} \perp T$ and
$\{i,k\} \perp T$, then $\{j,k\} \parallel T$.

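Given $\mathcal{T}' \subseteq \mathcal{T}$, we define a binary relation
$\sim_{\mathcal{T}'}$ on $S$: for two items $i, j$, $i \sim_{\mathcal{T}'} j$
iff $\{i,j\} \parallel \mathcal{T}'$. By Fact 1, $\sim_{\mathcal{T}'}$ is an
equivalence relation. The equivalence class containing $i$ is denoted by $[i]$.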
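As a rough illustration of this relation, the classes can be computed by grouping items on their membership signature across the chosen tests. The helper name `equivalence_classes` and the example data are assumptions made for this sketch.

```python
from collections import defaultdict

def equivalence_classes(items, tests):
    """Group items by membership signature: i ~ j iff no test separates them."""
    classes = defaultdict(set)
    for i in items:
        signature = tuple(i in t for t in tests)
        classes[signature].add(i)
    # Sort the classes by their smallest member to get a stable result.
    return sorted(classes.values(), key=min)

# Hypothetical example: a single test splits five items into two classes.
classes = equivalence_classes({1, 2, 3, 4, 5}, [frozenset({1, 2})])
```

Two items share a class exactly when every test contains both or neither of them, which is the condition $\{i,j\} \parallel \mathcal{T}'$.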

Fact 3. If $\mathcal{T}$ is a minimal test set, then $|\mathcal{T}| \le n - 1$.

Fact 4. If $\mathcal{T}$ is a test set, then $|\mathcal{T}| \ge \log_2 n$.

A test set $\mathcal{T}$ with $|\mathcal{T}| = \log_2 n$ is called a compact
test set. If $\mathcal{T}$ is a compact test set, then $|S| = 2^q$,
$q \in Z^+$.

In the set cover problem, we are given $U$, the universe, and $\mathcal{C}$, a
collection of subsets of $U$. $\mathcal{C}$ is a set cover of $U$, that is,
$\bigcup_{c \in \mathcal{C}} c = U$. The objective is to find
$\mathcal{C}' \subseteq \mathcal{C}$ with minimum cardinality which is also a
set cover of $U$.

The greedy algorithm for set cover works as follows. In each iteration, select
a subset covering the most uncovered elements; repeat this process until all
elements are covered, and return the set of selected subsets. Let $N$ be the
size of the universe, and $M^*$ be the size of the optimal set cover. The
greedy algorithm for set cover has performance guarantee
$\ln N - \ln \ln N + O(1)$ by [5].
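A minimal Python sketch of this greedy rule follows. The function name `greedy_set_cover`, the tie-breaking (first maximal subset in list order) and the sample instance are assumptions of the sketch, not taken from the paper.

```python
def greedy_set_cover(universe, subsets):
    """Repeatedly pick the subset that covers the most uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # max() returns the first subset attaining the maximum (assumed tie-break).
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= best
    return chosen

# Hypothetical instance with six elements and four subsets.
universe = range(1, 7)
subsets = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 5}),
    frozenset({3, 4, 6}),
    frozenset({5, 6}),
]
cover = greedy_set_cover(universe, subsets)
```

Here the first pick covers four elements and the second pick covers the remaining two, so the greedy cover has size 2.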

We give two lemmas about the greedy algorithm for set cover. Lemma 1 is a
corollary of Lemma 2 in [6], and Lemma 2 is a corollary of Lemma 1 and Lemma
4 in [6].

Lemma 1. The size of the set cover returned by the greedy algorithm is at most
$M^*(\ln N - \ln M^* + 1)$.

Lemma 2. Given $N$ and $M^*$, there are instances of the set cover problem
such that the size of the set cover returned by the greedy algorithm is at
least $(M^* - 1)(\ln N - \ln M^*)$.

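The test set problem can be transformed into the set cover problem in a
natural way. Let $(S, \mathcal{T})$ be an instance of test set. We construct
an instance $(U, \mathcal{C})$ of set cover, where
$U = \{\{i,j\} \mid i, j \in S, i \ne j\}$,
$\mathcal{C} = \{c(T) \mid T \in \mathcal{T}\}$, and
$c(T) = \{\{i,j\} \mid i \in T, j \in S - T\}$.

Clearly, $\mathcal{T}' \subseteq \mathcal{T}$ is a test set of $S$ iff
$\mathcal{C}' = \{c(T) \mid T \in \mathcal{T}'\}$ is a set cover of $U$.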
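The transformation can be sketched as follows, with item pairs represented as two-element frozensets. The function name `to_set_cover` and the three-item example are assumptions made for illustration.

```python
from itertools import combinations

def to_set_cover(items, tests):
    """Map a test set instance (S, T) to a set cover instance (U, C)."""
    # U: all unordered pairs of distinct items.
    universe = {frozenset(p) for p in combinations(items, 2)}
    # c(T): the pairs with exactly one endpoint in T.
    cover_sets = [
        frozenset(p for p in universe if len(p & t) == 1)
        for t in tests
    ]
    return universe, cover_sets

universe, cover_sets = to_set_cover({1, 2, 3}, [frozenset({1}), frozenset({1, 2})])
```

In this example the two sets $c(T)$ together contain all three item pairs, matching the fact that the two tests form a test set.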
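SGA can be described as:

Input: $S$, $\mathcal{T}$;

Output: a test set of $S$;
begin

    $\bar{\mathcal{T}} \leftarrow \emptyset$;

    while $\#(\bar{\mathcal{T}}) > 0$ do

        select $T$ in $\mathcal{T} - \bar{\mathcal{T}}$ minimizing $\#(\bar{\mathcal{T}} \cup \{T\})$;

        $\bar{\mathcal{T}} \leftarrow \bar{\mathcal{T}} \cup \{T\}$;
    endwhile
    return $\bar{\mathcal{T}}$;

end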

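In SGA, we call $\bar{\mathcal{T}}$ the partial test set. The differentiation
measure of $\bar{\mathcal{T}}$, $\#(\bar{\mathcal{T}})$, is defined as the
number of item pairs not differentiated by $\bar{\mathcal{T}}$. The
differentiation measure of $T$ (relative to $\bar{\mathcal{T}}$) is defined as
$\#(T, \bar{\mathcal{T}}) = \#(\bar{\mathcal{T}}) - \#(\bar{\mathcal{T}} \cup \{T\})$.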
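The procedure can be sketched in Python as below. This is a direct, unoptimized reading of the description; the names `undifferentiated` and `sga`, the first-minimum tie-breaking and the sample instance are assumptions of the sketch.

```python
from itertools import combinations

def undifferentiated(items, chosen):
    """#(partial): number of item pairs that no chosen test differentiates."""
    return sum(
        1
        for i, j in combinations(sorted(items), 2)
        if all((i in t) == (j in t) for t in chosen)
    )

def sga(items, tests):
    """Greedy: add the test that leaves the fewest undifferentiated pairs."""
    partial = []
    remaining = list(tests)
    while undifferentiated(items, partial) > 0:
        if not remaining:
            raise ValueError("the given tests are not a test set")
        best = min(remaining, key=lambda t: undifferentiated(items, partial + [t]))
        partial.append(best)
        remaining.remove(best)
    return partial

# Hypothetical instance: four items and four candidate tests.
items = {1, 2, 3, 4}
tests = [frozenset({1}), frozenset({1, 2}), frozenset({1, 3}), frozenset({4})]
result = sga(items, tests)
```

On this instance the first pick leaves two undifferentiated pairs and the second pick leaves none, so two tests are returned.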

SGA is in fact isomorphic to the greedy algorithm for set cover under the
natural transformation. Since $\ln |U| = \ln \binom{n}{2} < 2 \ln n$, we
immediately obtain the performance guarantee $2 \ln n$ of SGA. This paper
shows a better performance guarantee and a nontrivial lower bound on the
performance guarantee. The two main theorems are:

Theorem 1. The performance guarantee of SGA can be $1.1354 \ln n$.

Theorem 2. There are arbitrarily large instances of the test set problem such
that the performance ratio of SGA on these instances is at least
$1.0004471 \ln n$.

The harmonic number is defined as $H_n := \sum_{i=1}^{n} 1/i$.

Two inequalities are listed here for convenience of the proof in Section 3.

Fact 5. For any $0 < x < 1$, $(1 - x)^{1/x} < 1/e$.

Denote $\phi(x) := \frac{1}{x}(\ln x - 1)$.

Fact 6. For any $x \ge 1$, $\phi(x) \le 1/e^2 = 0.135\cdots$.
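Both facts can be checked numerically. The sketch below assumes the reading $\phi(x) = (\ln x - 1)/x$, whose derivative $(2 - \ln x)/x^2$ vanishes at $x = e^2$; the grid and sample points are arbitrary choices made for illustration.

```python
import math

def phi(x):
    """phi(x) = (ln x - 1) / x, the function bounded in Fact 6."""
    return (math.log(x) - 1.0) / x

# Sample phi on a grid over [1, 100] and compare with the claimed maximum 1/e^2.
grid = [1.0 + 0.001 * k for k in range(99001)]
grid_max = max(phi(x) for x in grid)
claimed_max = math.exp(-2.0)

# Fact 5 on a few sample points: (1 - x)^(1/x) < 1/e for 0 < x < 1.
fact5_ok = all(
    (1.0 - x) ** (1.0 / x) < math.exp(-1.0)
    for x in (0.01, 0.1, 0.5, 0.9, 0.99)
)
```

A grid check is of course not a proof; it only confirms that the sampled values stay below $1/e^2$ and that the bound is attained near $x = e^2$.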



3 Improved Performance Guarantee 
3.1 Differentiation Distribution 

In this subsection, the author analyzes the distribution of the number of
times item pairs are differentiated in instances of test set, in particular
the relationship between this differentiation distribution and the size of the
optimal test set.

Lemma 3. Given $S_1, S_2 \subseteq S$ with $S_1 \cap S_2 = \emptyset$, suppose
$\mathcal{T}$ is a test set of $S_1$ and a test set of $S_2$. Then at most
$\min(|S_1|, |S_2|)$ item pairs between $S_1$ and $S_2$ are not differentiated
by any test in $\mathcal{T}$.

Proof. Suppose $|S_1| \le |S_2|$. We claim that for any item $i \in S_1$,
there exists at most one item $j$ in $S_2$ satisfying
$\{i,j\} \parallel \mathcal{T}$. Otherwise there exist two different items
$j, k$ in $S_2$ such that $\{i,j\} \parallel \mathcal{T}$ and
$\{i,k\} \parallel \mathcal{T}$; then by Fact 1,
$\{j,k\} \parallel \mathcal{T}$, which contradicts the assumption that
$\mathcal{T}$ is a test set of $S_2$. □

Lemma 4. At most $n \log_2 n$ item pairs are differentiated by exactly one
test in $\mathcal{T}^*$.

Proof. Let $B$ be the set of item pairs that are differentiated by exactly one
test in $\mathcal{T}^*$. We prove $|B| \le n \log_2 n$ by induction on $n$.
When $n = 1$, $|B| = 0 = n \log_2 n$. Suppose the proposition holds for any
$n \le h - 1$; we prove that it holds for $n = h$.

Select $T \in \mathcal{T}^*$ such that $T \ne \emptyset$ and $T \ne S$; then
$|T| \le h - 1$ and $|S - T| \le h - 1$. Since $\mathcal{T}^*$ is a test set
of $T$, by the induction hypothesis, at most $|T| \log_2 |T|$ item pairs
inside of $T$ are differentiated by exactly one test in $\mathcal{T}^*$.
Similarly, at most $|S - T| \log_2 |S - T|$ item pairs inside of $S - T$ are
differentiated by exactly one test in $\mathcal{T}^*$.

By Lemma 3, at most $\min(|T|, |S - T|)$ item pairs between $T$ and $S - T$
are not differentiated by any test in $\mathcal{T}^* - \{T\}$. Therefore at
most $\min(|T|, |S - T|)$ item pairs between $T$ and $S - T$ are
differentiated by exactly one test in $\mathcal{T}^*$.

W.l.o.g., suppose $|T| \le |S - T|$, then

\[
\begin{aligned}
|B| &\le |T| \log_2 |T| + |S - T| \log_2 |S - T| + |T| \\
    &= |T| \log_2 (2|T|) + |S - T| \log_2 |S - T| \\
    &\le |T| \log_2 |S| + |S - T| \log_2 |S| \\
    &= |S| \log_2 |S|.
\end{aligned}
\]

□
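As a sanity check of the bound in Lemma 4, one can count the pairs differentiated exactly once on a small instance whose optimal test set is known. The sketch below uses the items $0, \ldots, 2^q - 1$ with one test per binary digit; this instance, the indexing from 0 and the function name are assumptions chosen for illustration. The $q$ bit tests are optimal here because Fact 4 requires at least $\log_2 n = q$ tests.

```python
from itertools import combinations
import math

def pairs_differentiated_exactly_once(items, tests):
    """Count item pairs that exactly one test differentiates."""
    return sum(
        1
        for i, j in combinations(items, 2)
        if sum((i in t) != (j in t) for t in tests) == 1
    )

q = 4
items = list(range(2 ** q))
# Bit tests: test k holds the items whose k-th binary digit is 1.
tests = [frozenset(x for x in items if (x >> k) & 1) for k in range(q)]
n = len(items)
count = pairs_differentiated_exactly_once(items, tests)
```

A pair is differentiated exactly once when the two items differ in a single binary digit, which gives $nq/2 = 32$ pairs for $q = 4$, below the bound $n \log_2 n = 64$.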

Lemma 5. Given $S'' \subseteq S' \subseteq S$, suppose $\mathcal{T}$ is a test
set of $S''$ and a test set of $S' - S''$. Then at most $|S'| \log_2 |S'|$
item pairs between $S''$ and $S' - S''$ are differentiated by exactly one test
in $\mathcal{T}$.

Proof. Let $B$ be the set of item pairs between $S''$ and $S' - S''$ which are
differentiated by exactly one test in $\mathcal{T}$. We prove that
$|B| \le |S'| \log_2 |S'|$ by induction. When $|S'| = 1$ and $|S'| = 2$, the
lemma holds clearly. Suppose the lemma holds for any $|S'| \le h - 1$,
$h \ge 3$; we prove that the lemma holds for $|S'| = h$.

Select $T \in \mathcal{T}$ such that $T \ne \emptyset$ and $T \ne S'$; then
$|T| \le h - 1$ and $|S' - T| \le h - 1$. Since $\mathcal{T} - \{T\}$ is a
test set of $S'' \cap T$ and a test set of $(S' - S'') \cap T$, by the
induction hypothesis, at most $|T| \log_2 |T|$ item pairs between
$S'' \cap T$ and $(S' - S'') \cap T$ are differentiated by exactly one test in
$\mathcal{T}$. Similarly, at most $|S' - T| \log_2 |S' - T|$ item pairs
between $S'' \cap (S' - T)$ and $(S' - S'') \cap (S' - T)$ are differentiated
by exactly one test in $\mathcal{T}$.

Fig. 1. Illustration of Lemma 5

Since $\mathcal{T} - \{T\}$ is a test set of $S'' \cap T$ and a test set of
$(S' - S'') \cap (S' - T)$, by Lemma 3, at most
$\min(|S'' \cap T|, |(S' - S'') \cap (S' - T)|)$ item pairs between
$S'' \cap T$ and $(S' - S'') \cap (S' - T)$ are not differentiated by any test
in $\mathcal{T} - \{T\}$. Hence at most
$\min(|S'' \cap T|, |(S' - S'') \cap (S' - T)|)$ item pairs between
$S'' \cap T$ and $(S' - S'') \cap (S' - T)$ are differentiated by exactly one
test in $\mathcal{T}$. Similarly, at most
$\min(|(S' - S'') \cap T|, |S'' \cap (S' - T)|)$ item pairs between
$(S' - S'') \cap T$ and $S'' \cap (S' - T)$ are differentiated by exactly one
test in $\mathcal{T}$.

Clearly,

\[
|T| \ge \min(|S'' \cap T|, |(S' - S'') \cap (S' - T)|)
      + \min(|(S' - S'') \cap T|, |S'' \cap (S' - T)|).
\]

W.l.o.g., suppose $|T| \le |S' - T|$, then

\[
\begin{aligned}
|B| &\le |T| \log_2 |T| + |S' - T| \log_2 |S' - T| + |T| \\
    &= |T| \log_2 (2|T|) + |S' - T| \log_2 |S' - T| \\
    &\le |T| \log_2 |S'| + |S' - T| \log_2 |S'| \\
    &= |S'| \log_2 |S'|.
\end{aligned}
\]

□

Lemma 6. At most $n \log_2 n \, (m^*)^{t-1}$ item pairs are differentiated by
exactly $t$ tests in $\mathcal{T}^*$, where $t \ge 2$.

Proof. Let $B_t$ be the set of item pairs that are differentiated by exactly
$t$ tests in $\mathcal{T}^*$. For any combination $\pi$ of $t - 1$ tests in
$\mathcal{T}^*$, let $B_\pi$ be the subset of $B_t$ such that each item pair
in $B_\pi$ is differentiated by every test in $\pi$.

Let $\sim_\pi$ be the equivalence relation induced by $\pi$. For any
equivalence class $[i]$, there is exactly one equivalence class $[j]$ such
that each item pair between $[i]$ and $[j]$ is differentiated by every test in
$\pi$ (Fact 2).

Since $\mathcal{T}^* - \pi$ is a test set of $[i]$ and a test set of $[j]$, by
Lemma 5, at most $|[i] \cup [j]| \log_2 |[i] \cup [j]|$ item pairs between
$[i]$ and $[j]$ are differentiated by exactly one test in
$\mathcal{T}^* - \pi$. In other words, at most
$|[i] \cup [j]| \log_2 |[i] \cup [j]|$ item pairs between $[i]$ and $[j]$ are
differentiated by exactly $t$ tests in $\mathcal{T}^*$. Hence

\[
|B_\pi| \le \sum |[i] \cup [j]| \log_2 |[i] \cup [j]| \le n \log_2 n.
\]

Therefore,

\[
|B_t| \le \sum_{\pi} |B_\pi| \le \binom{m^*}{t-1} \times n \log_2 n
      \le n \log_2 n \, (m^*)^{t-1}.
\]

□

Lemma 7. At most $2n \log_2 n \, (m^*)^{t-1}$ item pairs are differentiated by
at most $t$ tests in $\mathcal{T}^*$, where $t \ge 2$.

Proof. Let $B$ be the set of item pairs that are differentiated by at most $t$
tests in $\mathcal{T}^*$, and $B_t$ be the set of item pairs that are
differentiated by exactly $t$ tests in $\mathcal{T}^*$. By Lemma 4 and
Lemma 6,

\[
\begin{aligned}
|B| &= |B_1| + |B_2| + \cdots + |B_t| \\
    &\le n \log_2 n \, (1 + m^* + \cdots + (m^*)^{t-1}) \\
    &\le 2n \log_2 n \, (m^*)^{t-1}.
\end{aligned}
\]

□
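The last inequality in the proof of Lemma 7 is where the standing assumption $m^* \ge 2$ enters. Spelled out as a geometric sum (a worked step added for the reader, using only that assumption):

```latex
\[
1 + m^* + \cdots + (m^*)^{t-1}
  = \frac{(m^*)^{t} - 1}{m^* - 1}
  < (m^*)^{t-1} \cdot \frac{m^*}{m^* - 1}
  \le 2\,(m^*)^{t-1},
\qquad \text{since } \frac{m^*}{m^* - 1} \le 2 \text{ for } m^* \ge 2 .
\]
```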

3.2 Proof of Theorem 1 

In this subsection, the author uses the potential function technique to derive
the improved performance guarantee of SGA for test set. The proof is based on
the trick of "balancing" the potential function by appending a negative term
to the differentiation measure.

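Let $I = \lceil \ln \frac{n-1}{4 \log_2 n} / \ln m^* \rceil$, then
$2n \log_2 n \, (m^*)^{I-1} < \binom{n}{2} \le 2n \log_2 n \, (m^*)^{I}$. Let
$\#_0 = 1$, $\#_1 = n \log_2 n$, $\#_t = 2n \log_2 n \, (m^*)^{t-1}$ for
$2 \le t \le I$, and $\#_{I+1} = n(n-1)/2$.

Let $k_t = \frac{m^*}{t} \ln \frac{t \, \#_t}{\#_{t-1}}$, $2 \le t \le I + 1$.

Denote by $p$ the probability distribution on tests in $\mathcal{T}^*$ that
draws one test uniformly from $\mathcal{T}^*$. For any
$T \in \mathcal{T}^*$, the probability of drawing $T$ is
$p(T) = \frac{1}{m^*}$.

We divide a run of the algorithm into $I + 1$ phases, from Phase $I + 1$ to
Phase 1. In Phase $t$, $I + 1 \ge t \ge 1$, the algorithm runs until
$\#(\bar{\mathcal{T}}) < \#_{t-1}$. Let the set of tests selected in Phase $t$
be $\mathcal{T}_t$, and the partial test set when Phase $t$ stops be
$\bar{\mathcal{T}}_t$, $1 \le t \le I + 1$. Then
$\bar{\mathcal{T}}_t = \bigcup_{t \le s \le I+1} \mathcal{T}_s$,
$1 \le t \le I + 1$, and the returned test set is
$\mathcal{T}' = \bigcup_{1 \le t \le I+1} \mathcal{T}_t$. Set
$\bar{\mathcal{T}}_{I+2} = \emptyset$. If $\mathcal{T}_t \ne \emptyset$, let
the last test selected in Phase $t$ be $T_t'$.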
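To make the phase thresholds concrete, the sketch below computes $I$ and $\#_0, \ldots, \#_{I+1}$ for sample values. It assumes the reading $I = \lceil \ln\frac{n-1}{4 \log_2 n} / \ln m^* \rceil$, which is the choice that yields the bracketing inequality $2n \log_2 n \, (m^*)^{I-1} < \binom{n}{2} \le 2n \log_2 n \, (m^*)^{I}$; the function name and the values $n = 2^{20}$, $m^* = 32$ are hypothetical.

```python
import math

def phase_parameters(n, m_star):
    """Return I and the list of thresholds [#_0, #_1, ..., #_{I+1}]."""
    log2n = math.log2(n)
    I = math.ceil(math.log((n - 1) / (4 * log2n)) / math.log(m_star))
    thresholds = [1.0, n * log2n]
    thresholds += [2 * n * log2n * m_star ** (t - 1) for t in range(2, I + 1)]
    thresholds.append(n * (n - 1) / 2)  # #_{I+1} = C(n, 2)
    return I, thresholds

n, m_star = 2 ** 20, 32
I, thresholds = phase_parameters(n, m_star)
total_pairs = n * (n - 1) / 2
```

For these sample values the thresholds increase from $\#_0$ to $\#_{I+1}$, so each phase has a well-defined range of differentiation measures to work through.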

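In Phase $t$, $I + 1 \ge t \ge 2$, define the potential function as

\[
f(\bar{\mathcal{T}}) = \Bigl(\#(\bar{\mathcal{T}}) - \frac{t-1}{t}\#_{t-1}\Bigr)
  \Bigl(1 - \frac{t}{m^*}\Bigr)^{k_t - |\bar{\mathcal{T}} - \bar{\mathcal{T}}_{t+1}|}.
\]

By the definition of $\bar{\mathcal{T}}_{t+1}$ and Fact 5,

\[
f(\bar{\mathcal{T}}_{t+1}) < \frac{1}{t}\#_{t-1}.
\]

By the definition of $f(\bar{\mathcal{T}})$ and the facts $p(T) \ge 0$ and
$\sum_{T \in \mathcal{T}^*} p(T) = 1$,

\[
\begin{aligned}
\min_{T \in \mathcal{T}} f(\bar{\mathcal{T}} \cup \{T\})
 &\le \min_{T \in \mathcal{T}^*} f(\bar{\mathcal{T}} \cup \{T\}) \\
 &\le \sum_{T \in \mathcal{T}^*} p(T) f(\bar{\mathcal{T}} \cup \{T\}) \\
 &= \Bigl(\#(\bar{\mathcal{T}}) - \frac{t-1}{t}\#_{t-1}
      - \sum_{T \in \mathcal{T}^*} p(T)\,\#(T, \bar{\mathcal{T}})\Bigr)
    \Bigl(1 - \frac{t}{m^*}\Bigr)^{k_t - |\bar{\mathcal{T}} - \bar{\mathcal{T}}_{t+1}| - 1} \\
 &= \Bigl(\#(\bar{\mathcal{T}}) - \frac{t-1}{t}\#_{t-1}
      - \sum_{\{i,j\} \parallel \bar{\mathcal{T}}} \;
        \sum_{T \in \mathcal{T}^* : \{i,j\} \perp T} p(T)\Bigr)
    \Bigl(1 - \frac{t}{m^*}\Bigr)^{k_t - |\bar{\mathcal{T}} - \bar{\mathcal{T}}_{t+1}| - 1},
\end{aligned}
\]

and by Lemma 4 and Lemma 7,

\[
\begin{aligned}
\sum_{\{i,j\} \parallel \bar{\mathcal{T}}} \;
  \sum_{T \in \mathcal{T}^* : \{i,j\} \perp T} p(T)
 &\ge \sum_{\{i,j\} \parallel \bar{\mathcal{T}} : \perp(\{i,j\}, \mathcal{T}^*) \le t-1} \frac{1}{m^*}
    + \sum_{\{i,j\} \parallel \bar{\mathcal{T}} : \perp(\{i,j\}, \mathcal{T}^*) \ge t} \frac{t}{m^*} \\
 &\ge \sum_{\{i,j\} \parallel \bar{\mathcal{T}}} \frac{t}{m^*}
    - \sum_{\{i,j\} \parallel \bar{\mathcal{T}} : \perp(\{i,j\}, \mathcal{T}^*) \le t-1} \frac{t-1}{m^*} \\
 &\ge \frac{t}{m^*}\,\#(\bar{\mathcal{T}}) - \frac{t-1}{m^*}\,\#_{t-1}.
\end{aligned}
\]

Therefore,

\[
\min_{T \in \mathcal{T}} f(\bar{\mathcal{T}} \cup \{T\})
 \le \Bigl(\#(\bar{\mathcal{T}}) - \frac{t-1}{t}\#_{t-1}\Bigr)
     \Bigl(1 - \frac{t}{m^*}\Bigr)^{k_t - |\bar{\mathcal{T}} - \bar{\mathcal{T}}_{t+1}|}
 = f(\bar{\mathcal{T}}).
\]

During Phase $t$, the algorithm selects $T$ in $\mathcal{T}$ to minimize
$f(\bar{\mathcal{T}} \cup \{T\})$. Therefore,
$f(\bar{\mathcal{T}}_t - \{T_t'\}) \le f(\bar{\mathcal{T}}_{t+1}) < \frac{1}{t}\#_{t-1}$.

On the other hand, $\#(\bar{\mathcal{T}}_t - \{T_t'\}) \ge \#_{t-1}$ by the
definition of Phase $t$. Hence

\[
\begin{aligned}
f(\bar{\mathcal{T}}_t - \{T_t'\})
 &\ge \Bigl(\#_{t-1} - \frac{t-1}{t}\#_{t-1}\Bigr)
      \Bigl(1 - \frac{t}{m^*}\Bigr)^{k_t - |\mathcal{T}_t - \{T_t'\}|} \\
 &= \frac{1}{t}\#_{t-1}
      \Bigl(1 - \frac{t}{m^*}\Bigr)^{k_t - |\mathcal{T}_t - \{T_t'\}|}.
\end{aligned}
\]

Therefore, $(1 - \frac{t}{m^*})^{k_t - |\mathcal{T}_t - \{T_t'\}|} < 1$, thus
$|\mathcal{T}_t - \{T_t'\}| < k_t$, and $|\mathcal{T}_t| < k_t + 1$.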
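The "balancing" role of the negative term can be made explicit with a short worked step. It assumes the potential function is read as $f(\bar{\mathcal{T}}) = (\#(\bar{\mathcal{T}}) - a)(1 - t/m^*)^{k_t - |\bar{\mathcal{T}} - \bar{\mathcal{T}}_{t+1}|}$ with an offset $a$ to be determined, and writes $x = \#(\bar{\mathcal{T}})$. After one random draw from $\mathcal{T}^*$, the expected measure is at most $(1 - \frac{t}{m^*})x + \frac{t-1}{m^*}\#_{t-1}$, and the offset is chosen so that the shifted measure contracts by exactly the factor $1 - t/m^*$:

```latex
\[
\Bigl(1 - \tfrac{t}{m^*}\Bigr) x + \tfrac{t-1}{m^*}\,\#_{t-1} - a
  = \Bigl(1 - \tfrac{t}{m^*}\Bigr)(x - a)
\quad\Longleftrightarrow\quad
\tfrac{t-1}{m^*}\,\#_{t-1} = \tfrac{t}{m^*}\,a
\quad\Longleftrightarrow\quad
a = \tfrac{t-1}{t}\,\#_{t-1}.
\]
```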
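To sum up,

\[
|\bar{\mathcal{T}}_2| < \sum_{2 \le t \le I+1} (k_t + 1)
 = m^* \Bigl( \sum_{2 \le t \le I+1} \frac{1}{t} \ln \frac{\#_t}{\#_{t-1}}
            + \sum_{2 \le t \le I+1} \frac{\ln t}{t} \Bigr) + I.
\]

When all Phases $t$, $I + 1 \ge t \ge 2$, have ended, consider the instance of
set cover $(U, \mathcal{C})$, where
$U = \{a \mid a \parallel \bar{\mathcal{T}}_2\}$ and
$\mathcal{C} = \{c(T) \mid c(T) \cap U \ne \emptyset\}$. Clearly,
$|U| < \#_1$. Let $M^*$ be the size of the optimal set cover of this instance.
Then $M^* \le m^*$.

Consider the following two cases: (a) $M^* < m^*/2$; (b) $M^* \ge m^*/2$.

In case (a),

\[
|\mathcal{T}_1| \le M^*(\ln \#_1 + 1) \le m^*\Bigl(\frac{1}{2} + o(1)\Bigr) \ln n,
\]

and

\[
|\bar{\mathcal{T}}_2| < m^* \Bigl( \sum_{2 \le t \le I+1} \frac{1}{2} \ln \frac{\#_t}{\#_{t-1}}
   + \frac{1}{2} \ln^2 (I + 2) \Bigr) + I
 \le m^*\Bigl(\frac{1}{2} + o(1)\Bigr) \ln n.
\]

Hence

\[
|\mathcal{T}'| = |\bar{\mathcal{T}}_2| + |\mathcal{T}_1| \le m^*(1 + o(1)) \ln n.
\]

In case (b), by Lemma 1,

\[
|\mathcal{T}_1| \le M^*(\ln \#_1 - \ln M^* + 1) \le m^*((1 + o(1)) \ln n - \ln m^*),
\]

and

\[
\begin{aligned}
|\bar{\mathcal{T}}_2| &< m^* \Bigl( H_{I+1} \ln m^* + \frac{1}{2} \ln 2
    + \frac{1}{2} \ln^2 (I + 2) \Bigr) + I \\
 &\le m^* \Bigl( \ln \frac{\ln n}{\ln m^*} \cdot \ln m^* + o(1) \ln n \Bigr).
\end{aligned}
\]

Hence

\[
|\mathcal{T}'| = |\bar{\mathcal{T}}_2| + |\mathcal{T}_1|
 \le m^* \Bigl( 1 + \phi\Bigl(\frac{\ln n}{\ln m^*}\Bigr) + o(1) \Bigr) \ln n.
\]

By Fact 6,

\[
|\mathcal{T}'| \le m^*(1.13533\cdots + o(1)) \ln n.
\]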



4 Lower Bound 



In this section, we discuss a variation of the test set problem. Given
disjoint sets $S^1, \ldots, S^r$ and $\mathcal{T}$, a set of subsets of the
universe $S = S^1 \cup \cdots \cup S^r$, we seek
$\mathcal{T}' \subseteq \mathcal{T}$ with minimum cardinality which is a test
set of every $S^p$ for $1 \le p \le r$. Denote the instance by
$(S^p; \mathcal{T})$.

In our construction, we could use a split trick similar to that used in [5]
to split $S^1, \ldots, S^r$ by $O(\log |S|)$ tests. The splitting overhead can
be ignored, provided that the size of the optimal solution is $\Omega(|S|^c)$
for some constant $c$.

Let $\hat{\mathcal{T}}$ be a compact test set of
$\{x \mid 1 \le x \le 2^q, x \in Z^+\}$. For example, we can let
$\hat{\mathcal{T}} = \{T_k \mid 1 \le k \le q\}$, where $T_k$ contains the
integers $x$ between 1 and $2^q$ such that the $k$-th bit of $x$'s binary
representation is 1.
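The bit-based example can be checked directly for a small $q$. In the sketch below the function name `bit_tests` is hypothetical, and it assumes that $k = 1$ denotes the least significant bit, which the text does not specify.

```python
from itertools import combinations

def bit_tests(q):
    """T_k = integers x in 1..2^q whose k-th binary digit (k = 1 is lowest) is 1."""
    universe = range(1, 2 ** q + 1)
    return [
        frozenset(x for x in universe if (x >> (k - 1)) & 1)
        for k in range(1, q + 1)
    ]

q = 3
tests = bit_tests(q)
# Check that every pair of distinct integers in 1..2^q is separated by some T_k.
separated = all(
    any((i in t) != (j in t) for t in tests)
    for i, j in combinations(range(1, 2 ** q + 1), 2)
)
```

The integer $2^q$ has none of its $q$ lowest bits set, so it lies in no test; it is still separated from every other integer because each of $1, \ldots, 2^q - 1$ lies in at least one test.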

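4.1 Level-t Instances

Firstly, we give the level-$t$ atom instances. Let the instance be
$(S_t^y; \mathcal{T}_t)$. The universe $S_t$ consists of integral points in
$(t+1)$-dimensional Euclidean space.

$S_t = \bigcup_y S_t^y$,
$S_t^y = \{(x_1, \ldots, x_t, y) \mid 1 \le x_i \le 2^q\}$,
$1 \le y \le 2^{q-2}$. $\mathcal{T}_t = \mathcal{T}_t^* \cup \mathcal{T}_t'$.
$\mathcal{T}_t^* = \mathcal{T}_{t,1}^* \cup \cdots \cup \mathcal{T}_{t,t}^*$,
$\mathcal{T}_{t,i}^* = \{T_{t,i,j}^* \mid 1 \le j \le 2^q\}$.
$\mathcal{T}_t' = \mathcal{T}_{t,1}' \cup \cdots \cup \mathcal{T}_{t,t}'$,
$\mathcal{T}_{t,i}' = \{T_{t,i,j,k}' \mid 1 \le j \le 2^{q-2}, 1 \le k \le q\}$.

$T_{t,i,j}^*$ contains the points in $S_t^y$ with $x_i = j$. $T_{t,i,j,k}'$
contains the points in $S_t^j$ with $x_i$ in one of the $q$ tests in the
compact test set $\hat{\mathcal{T}}$. We assign an order to the tests
$T_{t,i,j,k}'$ in $\mathcal{T}_t'$, called the natural order, as the lexical
order of $(\ldots, k)$.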


Fig. 2. Atom instance with $q = 3$ and $t = 2$ (panels $y = 1$ and $y = 2$)



We claim that SGA could return $\mathcal{T}_t'$ according to the natural order
on atom instances.

At the beginning of the algorithm, the differentiation measure of tests in
$\mathcal{T}_t'$ is $2^{2qt-2}$, and the differentiation measure of tests in
$\mathcal{T}_t^*$ is

\[
2^{q(t-1)}(2^{qt} - 2^{q(t-1)}) \, 2^{q-2} = 2^{2qt-2}(1 - 2^{-q}).
\]

The algorithm could first select the tests in $\mathcal{T}_{t,1}'$ according
to their natural order. After that, the differentiation measure of tests in
$\mathcal{T}_t' - \mathcal{T}_{t,1}'$ decreases by a factor $2^q$, and the
differentiation measure of tests in $\mathcal{T}_t^*$ decreases by a factor at
least $2^q$. Hence, the algorithm could subsequently select the tests in
$\mathcal{T}_{t,2}', \ldots, \mathcal{T}_{t,t}'$ according to their natural
order.
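The two initial differentiation measures can be reproduced by direct counting on a small atom instance. The sketch assumes the reading of the construction in which a test in $\mathcal{T}_t'$ lives in a single block $S_t^y$ and splits it by one binary digit of $x_i$, while a test in $\mathcal{T}_t^*$ collects the points with $x_i = j$ in every block, and only pairs inside one block are counted. The function name and the choice of the lowest binary digit are hypothetical.

```python
from itertools import product

def atom_measures(q, t):
    """Initial differentiation measures of one T' test and one T* test."""
    side = range(1, 2 ** q + 1)
    ys = range(1, 2 ** (q - 2) + 1)
    block = list(product(side, repeat=t))  # the points (x_1..x_t) of one S_t^y
    # T' test: points of one block whose x_1 has its lowest binary digit set.
    inside = sum(1 for x in block if x[0] & 1)
    prime = inside * (len(block) - inside)
    # T* test: points with x_1 = 1 in every block; pairs are counted per block.
    hit = sum(1 for x in block if x[0] == 1)
    star = len(ys) * hit * (len(block) - hit)
    return prime, star

prime, star = atom_measures(q=3, t=2)
```

For $q = 3$ and $t = 2$, the parameters of Fig. 2, this gives $2^{10} = 1024$ for the $\mathcal{T}_t'$ test and $2^{10}(1 - 2^{-3}) = 896$ for the $\mathcal{T}_t^*$ test, in line with the formulas above.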

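Secondly, we construct a series of level-$t$ instances
$(S_t^{y,z,w}; \mathcal{T}_t)$, $1 \le t \le J$, based on the atom instances.
Atom instances are "stretched" in the $w$ dimension and "cloned" in the $z$
dimension.

We select $N$ and $M^*$ such that $M^* = J! \, 2^q$ and
$N = J! \, 2^{q(J+1)}$, where $q \in Z^+$ and $J = 2^K - 1$ for some
$K \in Z^+$. The universe $S_t$ consists of $N$ integral points in
$(t+3)$-dimensional Euclidean space.

$S_t = \bigcup_{y,z,w} S_t^{y,z,w}$,
$S_t^{y,z,w} = \{(x_1, \ldots, x_t, y, z, w) \mid 1 \le x_i \le 2^q\}$,
$1 \le y \le 2^{q-2}$, $1 \le z \le \frac{J!}{t}$,
$1 \le w \le t \, 2^{q(J-t)+2}$.
$\mathcal{T}_t = \mathcal{T}_t^* \cup \mathcal{T}_t'$.
$|\mathcal{T}_t^*| = M^*$, and $|\mathcal{T}_t'| = \frac{qM^*}{4}$.
$\mathcal{T}_t^* = \mathcal{T}_{t,1}^* \cup \cdots \cup \mathcal{T}_{t,t}^*$,
$\mathcal{T}_{t,i}^* = \{T_{t,i,j,l}^* \mid 1 \le j \le 2^q, 1 \le l \le \frac{J!}{t}\}$.
$\mathcal{T}_t' = \mathcal{T}_{t,1}' \cup \cdots \cup \mathcal{T}_{t,t}'$,
$\mathcal{T}_{t,i}' = \{T_{t,i,j,k,l}' \mid 1 \le j \le 2^{q-2}, 1 \le k \le q, 1 \le l \le \frac{J!}{t}\}$.

$T_{t,i,j,l}^*$ contains the points in $S_t^{y,l,w}$ with $x_i = j$.
$T_{t,i,j,k,l}'$ contains the points in $S_t^{j,l,w}$ with $x_i$ in one of the
$q$ tests in the compact test set $\hat{\mathcal{T}}$. We assign an order to
the tests $T_{t,i,j,k,l}'$ in $\mathcal{T}_t'$, called the natural order, as
the lexical order of $(\ldots, k, l)$.

We claim that SGA could select all tests in $\mathcal{T}_{t,1}'$ according to
their natural order on $(S_t^{y,z,w}; \mathcal{T}_t)$ in the beginning phase
of the algorithm.

At the beginning of the algorithm, the differentiation measure of tests in
$\mathcal{T}_{t,1}'$ is $\#_t^{begin} = \frac{t}{J!} 2^{q(t-1)} N$, and the
differentiation measure of tests in $\mathcal{T}_t^*$ is

\[
\frac{t}{J!} 2^{q(t-1)} (1 - 2^{-q}) N.
\]

The algorithm could select the tests in $\mathcal{T}_{t,1}'$ according to
their natural order, while the differentiation measure of the selected test is
kept equal to the differentiation measures of tests in $\mathcal{T}_{t,i}'$
for $2 \le i \le t$ and no less than the differentiation measure of any test
in $\mathcal{T}_t^*$.

When the algorithm selects the last test in $\mathcal{T}_{t,1}'$, its
differentiation measure is $\#_t^{end} = \frac{2t}{J!} 2^{q(t-2)} N$.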

4.2 Proof of Theorem 2 

Let $(U, \mathcal{C})$ be the instance in Lemma 2, $U = \{e_1, \ldots, e_N\}$,
$\mathcal{C}^*$ be the optimal set cover, and $\mathcal{C}'$ be the set cover
returned by the greedy algorithm. Construct an instance of test set
$(S_0^p; \mathcal{T}_0)$: $S_0^p = \{e_p, f_p\}$, $1 \le p \le N$,
$S_0 = \bigcup_{1 \le p \le N} S_0^p$, and
$\mathcal{T}_0 = \mathcal{C}^* \cup \mathcal{C}'$ (so that
$\mathcal{T}_0^* = \mathcal{C}^*$ and $\mathcal{T}_0' = \mathcal{C}'$).

On $(S_0^p; \mathcal{T}_0)$, the algorithm could select all the tests in
$\mathcal{T}_0'$, and the differentiation measure of the selected tests ranges
from $\#_0^{begin} = N/M^*$ to $\#_0^{end} = 1$ by the proof of Lemma 1 in
[3].

Consequently, we construct a series of level-$t$ instances
$(S_t^{y,z,w}; \mathcal{T}_t)$, $1 \le t \le J$, and combine them and
$(S_0^p; \mathcal{T}_0)$ into the complete instance. We intend to prove that
the performance ratio of SGA on this instance is at least
$\bigl(1 + \frac{1}{J+1}(\frac{H_J}{8 \ln 2} - 1) - o(1)\bigr) \ln n$ for
fixed $J$. When $J = 511$, this performance ratio is at least
$1.0004471 \ln n$.
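The constant for $J = 511$ can be reproduced numerically. The sketch assumes the bound reads $1 + \frac{1}{J+1}\bigl(\frac{H_J}{8 \ln 2} - 1\bigr)$ with the $o(1)$ term dropped; the function name is hypothetical.

```python
import math

def sga_lower_bound_factor(J):
    """1 + (H_J / (8 ln 2) - 1) / (J + 1), ignoring the o(1) term."""
    harmonic = sum(1.0 / i for i in range(1, J + 1))
    return 1.0 + (harmonic / (8.0 * math.log(2.0)) - 1.0) / (J + 1)

factor = sga_lower_bound_factor(511)
```

The factor exceeds 1 only when $H_J > 8 \ln 2 \approx 5.545$, which is why a fairly large $J$ is needed before the bound becomes nontrivial.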

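Let $S = \bigcup_{t=0}^{J} S_t$ and
$\mathcal{T} = \mathcal{T}^* \cup \mathcal{T}'$. Then $n = (J + 1)N$. We join
the tests in $\mathcal{T}_t^*$ for $0 \le t \le J$ one-by-one to obtain one
test set $\mathcal{T}^*$. Suppose
$\mathcal{T}_t^* = \{T_{t,1}^*, \ldots, T_{t,M^*}^*\}$, $0 \le t \le J$; then
$\mathcal{T}^* = \{T_{0,1}^* \cup \cdots \cup T_{J,1}^*, \ldots, T_{0,M^*}^* \cup \cdots \cup T_{J,M^*}^*\}$.
Clearly, $\mathcal{T}^*$ is an optimal test set, and $m^* = M^*$.

We modify the tests in $\mathcal{T}_t'$ by two operations: Enlargement and
Merging. In the Enlargement operation, tests in $\mathcal{T}_{t,1}'$ are
enlarged by a factor 2. In the Merging operation, tests in
$\mathcal{T}_{t,i}'$ for $i \ge 2$ are merged into tests in
$\mathcal{T}_{s,1}'$ for $t - 1 \ge s \ge 1$. Let
$\mathcal{T}' = \mathcal{T}_0' \cup \bigcup_{t=1}^{J} \mathcal{T}_{t,1}'$
after the two operations are performed.

Enlargement. Let
$\mathcal{T}_{t,1}' = \{T_{t,1,j,k,l}' \mid 1 \le j \le 2^{q-3}, 1 \le k \le q, 1 \le l \le \frac{J!}{t}\}$.
$T_{t,1,j,k,l}'$ contains the points in $S_t^{2j-1,l,w}$ and $S_t^{2j,l,w}$
with $x_1$ in one of the $q$ tests in the compact test set
$\hat{\mathcal{T}}$. As a result, $|\mathcal{T}_{t,1}'| = \frac{qM^*}{8t}$.

Merging. By the decreasing order of $t$ for $J \ge t \ge 2$, merge the tests
in $\mathcal{T}_t' - \mathcal{T}_{t,1}'$ by their natural order one-by-one
into the tests in $\mathcal{T}_{s,1}'$ for $t - 1 \ge s \ge 1$, by the
decreasing order of $s$ (primarily) and their natural order in
$\mathcal{T}_{s,1}'$, until the tests in $\mathcal{T}_t' - \mathcal{T}_{t,1}'$
are exhausted.

For any $J \ge t \ge 2$, the tests in $\mathcal{T}_{s,1}'$ suffice in the
Merging operation, provided that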

To finish the proof, we analyze the behavior of SGA on the complete instance.
Before the algorithm selects a test, let $\#_t$ be the maximum differentiation
measure of tests in $\mathcal{T}_{t,1}'$ for $J \ge t \ge 1$, and $\#^*$ be
the maximum differentiation measure of tests in $\mathcal{T}^*$.

For $t \le s \le J$, let $\#_{t,s}$ be the number of item pairs inside of
$S_s^{y,z,w}$ contributing to $\#_t$, $\#_{t-1,s}$ be the number of item pairs
inside of $S_s^{y,z,w}$ contributing to $\#_{t-1}$, $\#_s^*$ be the number of
item pairs inside of $S_s^{y,z,w}$ contributing to $\#^*$, and $\#_0^*$ be the
number of item pairs inside of $S_0^p$ contributing to $\#^*$. Then
$\#_t = \sum_{s=t}^{J} \#_{t,s}$, $\#_{t-1} = \sum_{s=t-1}^{J} \#_{t-1,s}$,
and $\#^* = \sum_{s=0}^{J} \#_s^*$.

Since $\#_{t,s} \ge \#_{t-1,s}$ for $s > t$ and

\[
\#_{t,t} = 2\#_{t-1,t} \ge \#_{t-1,t} + \#_t^{end}
 \ge \#_{t-1,t} + \#_{t-1}^{begin} \ge \#_{t-1,t} + \#_{t-1,t-1},
\]

it follows that $\#_t \ge \#_{t-1}$. Hence $\#_t \ge \#_s$ for any
$1 \le s \le t$.

Since $\#_{t,s} \ge \#_s^*$ for $s > t$ and

\[
\#_{t,t} \ge \#_t^* + \#_t^{end} \ge \#_t^* + 2\#_{t-1}^{begin}
 \ge \#_t^* + \sum_{s=0}^{t-1} \#_s^{begin}
 \ge \#_t^* + \sum_{s=0}^{t-1} \#_s^*,
\]

it follows that $\#_t \ge \#^*$.

We conclude that the algorithm could select all tests in
$\mathcal{T}_{t,1}'$ in their natural order, for $J \ge t \ge 1$, then select
all tests in $\mathcal{T}_0'$, and finally return $\mathcal{T}'$.

Under the condition that $J$ is fixed, the size of the returned test set is

\[
\begin{aligned}
|\mathcal{T}'| &\ge (M^* - 1)(\ln N - \ln M^*) + \frac{qM^*}{8} H_J \\
 &\ge m^* \Bigl( 1 + \frac{1}{J+1}\Bigl(\frac{H_J}{8 \ln 2} - 1\Bigr) - o(1) \Bigr) \ln n.
\end{aligned}
\]



5 Discussion 



The author notes that this is the first time the worst-case performance
guarantees of the two types of "greedy algorithms", implemented by the set
cover criterion and by the information criterion, have been distinguished
precisely. In fact, the author explicitly shows the pattern of instances on
which ICH performs better than SGA.

In a preceding paper [7], we proved that the performance guarantee of SGA can
be $(1.5 + o(1)) \ln n$, and the proof can be extended to the weighted case,
where each test is assigned a positive weight and the objective is to find a
test set with minimum total weight.

In the minimum cost probe set problem [8] of bioinformatics, tests are
replaced with partitions of items. The objective is to find a set of
partitions with smallest cardinality that differentiates all item pairs. It is
easily observed that the improved performance guarantee in this paper is still
applicable to this generalized case.

Acknowledgements. The author would like to thank Tao Jiang and Tian Liu 
for their helpful comments.